Keyword Selection Method for Characterizing Text Document Maps

Authors

  • Krista Lagus
  • Samuel Kaski

Abstract

Characterization of subsets of data is a recurring problem in data mining. We propose a keyword selection method that can be used for obtaining characterizations of clusters of data whenever textual descriptions can be associated with the data. Several methods that cluster data sets or form projections of data provide an order or distance measure of the clusters. If such an ordering of the clusters exists or can be deduced, the method utilizes the order to improve the characterizations. The proposed method may be applied, for example, to characterizing graphical displays of collections of data ordered, e.g., with the self-organizing map (SOM) algorithm. The method is validated using a collection of 10,000 scientific abstracts from the INSPEC database organized on a WEBSOM document map.
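
The abstract does not state the scoring rule, so the following is only a rough illustration of the idea. It assumes one plausible scheme: a word scores highly for a cluster when it is frequent in that cluster and rare in clusters that are not neighbors of it in the map ordering, so words shared with adjacent clusters are not penalized. The function names (`relative_frequencies`, `select_keywords`), the whitespace tokenization, and the score f²/(f + f_far) are hypothetical choices, not taken from the paper.

```python
from collections import Counter

# Hypothetical sketch: the tokenization and the scoring rule below are
# illustrative assumptions, not the paper's exact formulation.


def relative_frequencies(docs_by_cluster):
    """Return {cluster_id: {word: relative frequency within that cluster}}."""
    freqs = {}
    for cid, docs in docs_by_cluster.items():
        # Naive whitespace tokenization with lowercasing.
        counts = Counter(w for doc in docs for w in doc.lower().split())
        total = sum(counts.values())
        freqs[cid] = {w: c / total for w, c in counts.items()} if total else {}
    return freqs


def select_keywords(docs_by_cluster, neighbors, top_k=3):
    """Rank words per cluster by f^2 / (f + f_far), where f is the word's
    relative frequency in the cluster and f_far is its summed relative
    frequency in clusters that are NOT map neighbors of that cluster."""
    freqs = relative_frequencies(docs_by_cluster)
    keywords = {}
    for cid, f_here in freqs.items():
        near = neighbors.get(cid, set())
        # Clusters far away in the map ordering; words frequent there are penalized.
        far = [c for c in freqs if c != cid and c not in near]
        scores = {}
        for w, f in f_here.items():
            f_far = sum(freqs[c].get(w, 0.0) for c in far)
            scores[w] = f * f / (f + f_far)  # mathematically equals f when f_far == 0
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[cid] = ranked[:top_k]
    return keywords


# Toy example: three clusters laid out on a line (0 - 1 - 2).
docs = {
    0: ["neural network", "neural learning"],
    1: ["neural map", "map display"],
    2: ["query index", "query neural"],
}
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}

if __name__ == "__main__":
    print(select_keywords(docs, neighbors, top_k=2))
```

In this toy run, `neural` ranks first for cluster 0 but drops out of cluster 2's top two, because cluster 0 is not a neighbor of cluster 2 and uses the word heavily. Cluster 1 has every other cluster as a neighbor, so none of its words are penalized. Summing only over non-neighbors, rather than over all clusters, is one simple way to let the map ordering matter: a word shared by adjacent clusters can still characterize a whole region of the map.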

Similar sources

Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm

Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Directional Stroke Width Transform to Separate Text and Graphics in City Maps

City maps are among the most complex real-world documents. In these maps, text labels overlap with graphics and appear in a variety of fonts and styles in different orientations. Text and graphic colours are usually not predefined, since they vary across map publishers. In most city maps, text and graphic lines form a single connected component. Moreover, the common regions of text and graphic lin...

A Cluster Based Keyword Filtration Approach for Web Document Summarization

Summarization, an extremely important technique in data mining, is an automatic learning technique that aims to extract the most valuable information from large documents or articles. The goal is to create the summary of the document, but substantially different from each other. Text document summarization refers to the summarization of text documents based on their content. The proposed...

Publication date: 2008